Abstract

In this paper we propose a computational treatment of the resolution of zero pronouns in Japanese discourse, using an adaptation of the centering algorithm. We are able to factor language-specific dependencies into one parameter of the centering algorithm. Previous analyses have stipulated that a zero pronoun and its cospecifier must share a grammatical function property such as Subject or NonSubject. We show that this property-sharing stipulation is unneeded. In addition, we propose the notion of TOPIC AMBIGUITY within the centering framework, which predicts some ambiguities that occur in Japanese discourse. This analysis has implications for the design of language-independent discourse modules for Natural Language systems. The centering algorithm has been implemented in an HPSG Natural Language system with both English and Japanese grammars.



1 Introduction 

Japanese is a language well known for the grammaticization of discourse function. It is rich in ways for speakers to indicate the information status of the discourse entities they are talking about. Japanese allows a speaker to clearly indicate topic-hood, along with grammatical functions such as subject, object and object2, by using the morphological case markers wa, ga, o, ni. In addition, it provides morphological means to indicate the speaker's perspective through verbal compounding, i.e. the addition of suffixes such as kureta, kita (see section 3). Unexpressed arguments of the verb are common; these are known as zero pronouns.

Because there are zero pronouns, and because Japanese is a head-final language with otherwise relatively free word order, there could in principle be a great deal of ambiguity. However, this is not the case. Speakers are assumed to be cooperative, to be collaborating with the hearer in conversation, and to be ensuring that each utterance is relevant and coherent in the context of what was said before [Gri75, SSJ74]. We believe that speakers do not choose to express their thoughts through arbitrary syntactic constructions, but that there is some correspondence between the choice of syntactic construction, what the speaker wants to convey, and aspects of the current discourse situation [Pri85].



Within a theory of discourse, CENTERING is a computational model of the process by which a speaker and hearer make obvious to one another their assumptions about the salience of discourse entities. Using pronominal referring expressions is one way for discourse participants to do this. We propose that the resolution of zero pronouns is constrained by centering, and that ambiguity is thereby reduced.

Centering has its computational foundations in the work of Grosz and Sidner [Gro77, Sid79, GS85] and was further developed by Grosz, Joshi and Weinstein [GJW83, GJW86, JW81]. It is formalized as a system of constraints and rules which, as part of a computational discourse model, can act to control inferencing [JW81]. Brennan, Friedman and Pollard use these rules and constraints to develop an algorithm for resolving the co-specifiers of pronouns [BFP87, Wal89]. Our analysis uses an adaptation of this algorithm. By making full use of the centering formalism, we avoid the postulation of additional mechanisms, e.g. property sharing [Kam85].

In addition, we propose a notion of topic ambiguity, which characterizes some ambiguities in Japanese discourse that are allowed by the centering process. Topic ambiguity has been ignored in previous accounts of Japanese zero pronoun resolution, but it explains the availability of interpretations that previous accounts would predict as ungrammatical. Centering gives us a computational way of determining when a zero pronoun may be assigned Topic.
This analysis informs the design of language-independent discourse processing modules for Natural Language systems. We propose that the centering component of a discourse processing module can be constructed in a language-independent fashion, up to the declaration of a language-specific value for one variable in the algorithm, i.e., the Cf list ranking (see sections 2 and 3). The centering algorithm has been implemented in an HPSG Natural Language system with both English and Japanese grammars.



2 The Centering Formalism 

The modeling of attentional state in discourse by centering depends on analyzing each pair of utterances in a discourse according to a set of transitions. These transitions are a measure of the coherence of the segment of discourse in which the utterance occurs. Each utterance in a discourse has associated with it a set of discourse entities called FORWARD-LOOKING CENTERS, Cf, and a special member of this set called the BACKWARD-LOOKING CENTER, Cb. The FORWARD-LOOKING CENTERS are ranked according to discourse salience; the highest-ranked member of the set is the PREFERRED CENTER, Cp. With these definitions we can give the constraints:



• CONSTRAINTS

For each Ui in a discourse segment U1, ..., Un:

1. There is precisely one Cb.

2. Every element of Cf(Ui) must be realized¹ in Ui.

3. The center, Cb(Ui), is the highest-ranked element of Cf(Ui-1) that is realized in Ui.
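Constraint 3 is what makes the Cf ranking decisive. The following is a minimal Python sketch of it, not the authors' HPSG implementation; it assumes that a Cf list is a list of entity names ordered from highest to lowest rank, and it approximates "realized in Ui" by membership in a set of entity names. The function name `backward_center` is hypothetical.

```python
from typing import Optional, Sequence, Set


def backward_center(prev_cf: Sequence[str], realized: Set[str]) -> Optional[str]:
    """Constraint 3: Cb(Ui) is the highest-ranked element of Cf(Ui-1)
    that is realized in Ui.

    prev_cf  -- Cf(Ui-1), assumed ordered from highest to lowest rank.
    realized -- entities realized in Ui, overtly or as zero pronouns.
    Returns None when no element of Cf(Ui-1) is realized in Ui.
    """
    for entity in prev_cf:  # scan from the highest-ranked element down
        if entity in realized:
            return entity
    return None


# Example (5): both TAROO and HANAKO are realized by zeros in Un+2.
print(backward_center(["TAROO", "HANAKO"], {"HANAKO", "TAROO"}))  # TAROO
```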

The typology of transitions from one utterance, Ui-1, to the next, Ui, is based on two factors: whether the backward-looking center, Cb, is the same from Ui-1 to Ui, and whether this discourse entity is the same as the preferred center, Cp, of Ui. Backward-looking centers are often pronominalized, and discourses that continue centering the same entity are more coherent than those that shift from one center to another. This means that some transitions are preferred over others. These two facts give us the rules:



• RULES

For each Ui in a discourse segment U1, ..., Un:

1. If some element of Cf(Ui-1) is realized as a pronoun in Ui, then so is Cb(Ui).

2. Transition states are ordered. CONTINUING is preferred to RETAINING, which is preferred to SHIFTING-1, which is preferred to SHIFTING².

¹ An utterance U (of some phrase, not necessarily a full clause) realizes c if c is an element of the situation described by U, or c is the semantic interpretation of some subpart of U.
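The two rules can be sketched as simple checks. This is an illustration under stated assumptions rather than the algorithm's actual code: zero pronouns are counted as pronouns for Rule 1, and the numeric scores for Rule 2 are an arbitrary encoding of the stated order. The names `satisfies_rule1`, `most_preferred` and `TRANSITION_PREFERENCE` are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Set

# Rule 2: smaller score = more preferred transition (illustrative encoding).
TRANSITION_PREFERENCE = {
    "CONTINUING": 0,
    "RETAINING": 1,
    "SHIFTING-1": 2,
    "SHIFTING": 3,
}


def satisfies_rule1(prev_cf: Sequence[str], pronouns: Set[str],
                    cb: Optional[str]) -> bool:
    """Rule 1: if some element of Cf(Ui-1) is realized as a pronoun in Ui,
    then Cb(Ui) must be realized as a pronoun in Ui too.

    pronouns -- entities realized pronominally in Ui; counting zero
    pronouns as pronouns here is an assumption of this sketch.
    """
    if any(entity in pronouns for entity in prev_cf):
        return cb in pronouns
    return True  # the rule's antecedent does not hold, so it is satisfied


def most_preferred(transitions: Iterable[str]) -> str:
    """Rule 2: return the most preferred of the candidate transitions."""
    return min(transitions, key=TRANSITION_PREFERENCE.__getitem__)


print(most_preferred(["RETAINING", "CONTINUING"]))  # CONTINUING
```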

The transition states that are used in the rules are defined in Figure 1 (BACKWARD-LOOKING CENTER = Cb, PREFERRED CENTER = Cp).





                     Cb(Ui) = Cb(Ui-1)    Cb(Ui) ≠ Cb(Ui-1)
Cb(Ui) = Cp(Ui)      CONTINUING           SHIFTING-1
Cb(Ui) ≠ Cp(Ui)      RETAINING            SHIFTING

Figure 1: Transition States
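Figure 1 amounts to two yes/no tests: whether the Cb is kept from Ui-1 to Ui (column), and whether the Cb is also the Cp of Ui (row). A minimal Python rendering, assuming centers are compared by entity name; the function name `classify_transition` is hypothetical:

```python
def classify_transition(cb_prev: str, cb: str, cp: str) -> str:
    """Return the Figure 1 transition from Ui-1 to Ui.

    cb_prev -- Cb(Ui-1); cb -- Cb(Ui); cp -- Cp(Ui).
    """
    if cb == cb_prev:  # left column: the backward-looking center is kept
        return "CONTINUING" if cb == cp else "RETAINING"
    # right column: the backward-looking center has changed
    return "SHIFTING-1" if cb == cp else "SHIFTING"


# Example (6), Un+2: the Cb stays HANAKO, and HANAKO is also the Cp.
print(classify_transition("HANAKO", "HANAKO", "HANAKO"))  # CONTINUING
```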

The centering algorithm incorporates these rules and constraints in addition to linguistic constraints on coreference [BFP87]. The behavior of the centering algorithm for the resolution of pronouns is largely determined by the ranking of the items on the forward centers list, Cf, because, as per Constraint 3, this ranking determines which of the elements realized in the next utterance will be the Cb for that utterance. Although not all of the factors that contribute to the Cf ranking have been determined, syntax and lexical semantics have an effect [Pri81, Pri85, HD88, Bre89, GJW86, JW81, BF83]. We postulate that this ordering will vary from language to language depending on the means the language provides for expressing discourse functions. Our adaptation of the algorithm for Japanese consists of substituting a different ranking of the forward centers list Cf. In every other way, the algorithm functions exactly as it does for English.



3 Centering in Japanese 

In order to apply the centering algorithm to the resolution of zero pronouns in Japanese, we must determine how to order the forward centers list, Cf. The function TOPIC is indicated by the morphological marker wa, along with SUBJECT (ga), OBJECT (o), and OBJECT2 (ni). The optional use of wa picks out the most salient entity in the discourse. In addition, Kuno proposed the notion of EMPATHY, which is the perspective from which a speaker describes an event [Kun73]. The realization of the speaker's empathy is especially important when describing an event involving some transfer. For example, there is no way to describe a giving and receiving situation objectively [KK77]. In (1), the use of kureta, the past tense of the verb kureru, indicates the speaker's empathy with the discourse entity realized in object position³.

² [BFP87] introduces the distinction between SHIFTING-1 and SHIFTING.

(1)
Hanako wa Taroo ni hon o kureta.
top-subj obj2 book obj give-past
"Hanako gave Taroo a book."
EMPATHY=OBJ2=TAROO

In (2), the speaker's empathy with the subject entity's perspective is indicated using yatta, the past tense of the verb yaru.

(2)
Hanako wa Taroo ni hon o yatta.
top-subj obj2 book obj give-past
"Hanako gave Taroo a book."
EMPATHY=SUBJ=HANAKO



The use of deictic verbs such as kuru ('come') and iku ('go') also indicates the speaker's perspective.



Kuno calls a verb that is sensitive to the speaker's perspective an Empathy-loaded verb, and defines the Empathy locus as the argument position whose referent the speaker is identifying with⁴. Any Japanese verb can be made into an empathy-loaded verb by using an empathy-loaded verb as an auxiliary, which is suffixed onto the main verb stem. The complex predicate made by this operation inherits the empathy-locus of the suffixed verb. The kureru form of 'give' can be used as a suffix to mark OBJ or OBJ2 as the empathy-locus, as can the deictic verb kuru ('come'). The use of the suffix kureta is shown in (3).



(3)
Hanako wa Taroo ni hon o yonde-kureta.
book read-gave
"Hanako gave Taroo a favor of reading a book."
EMPATHY=OBJ2=TAROO

Suffixing verbs such as iku ('go') and the yaru form of 'give' marks the subject as the empathy-locus, e.g. itta in (4).



(4)
Hanako wa Taroo o tazunete-itta.
visit-went
"Hanako went to visit Taroo."
EMPATHY=SUBJ=HANAKO

³ We use identifiers in all capital letters to denote the discourse entity realized by the corresponding string. Centers are semantic entities, not syntactic ones.

⁴ The speaker does not necessarily take his/her own perspective to describe an event in which s/he is involved.

The relevance of the speaker's empathy to centering is that a discourse entity realized as the empathy-locus is more salient, so the empathy-locus position is ranked higher on the Cf. Therefore, we use a ranking for the Cf in Japanese that incorporates EMPATHY as follows:

Cf Ranking for Japanese 

TOPIC > EMPATHY > SUBJ > OBJ2 > OBJ
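Because the ranking is the only language-specific parameter of the algorithm, it can be supplied as data. The sketch below is an illustration under assumptions, not the authors' code: role labels are hypothetical string constants, an entity that fills several roles is ranked by its highest-ranked role (the paper's convention for entities with multiple roles), and roles missing from the ranking (e.g. adjuncts) sort last. Another language would be handled by passing a different ranking list; no such list for English is given here.

```python
from typing import Dict, List, Sequence, Set

# Japanese Cf ranking from this section, most salient role first.
JAPANESE_RANKING = ["TOPIC", "EMPATHY", "SUBJ", "OBJ2", "OBJ"]


def rank_cf(roles: Dict[str, Set[str]],
            ranking: Sequence[str] = JAPANESE_RANKING) -> List[str]:
    """Order the entities of an utterance into a Cf list.

    roles   -- maps each entity to the roles it fills in the utterance.
    ranking -- the language-specific parameter, most salient role first.
    Each entity is ranked by its best (highest-ranked) role; roles absent
    from the ranking sort after every listed role. Ties keep the input
    order because sorted() is stable.
    """
    position = {role: i for i, role in enumerate(ranking)}
    unlisted = len(ranking)

    def best_rank(entity: str) -> int:
        return min((position.get(r, unlisted) for r in roles[entity]),
                   default=unlisted)

    return sorted(roles, key=best_rank)


# Example (6), Un+1: HANAKO is OBJ and the empathy-locus; TAROO is SUBJ.
print(rank_cf({"TAROO": {"SUBJ"}, "HANAKO": {"OBJ", "EMPATHY"}}))
# ['HANAKO', 'TAROO']
```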

This ranking is a slight variation of that proposed by Kameyama [Kam88]. The centering algorithm works by taking the arguments of the verb and ordering them according to the Cf ranking for Japanese given above. Where there are zero pronouns, there will be multiple possibilities for their interpretation, so there will a priori be several possible Cf lists⁵. These Cf lists are filtered according to the centering rules and constraints in section 2. If there are still multiple possibilities, then the ordering on transitions applies, and continuing interpretations are preferred.
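The filter-then-prefer procedure just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the HPSG implementation: candidate readings are supplied by hand as entity-to-role mappings rather than generated from the zeros, Rule 1 and the linguistic constraints on coreference are not modelled, and all function names are hypothetical. The usage lines replay Un+2 of example (5) below.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Japanese Cf ranking (section 3), most salient role first.
JAPANESE_RANKING = ["TOPIC", "EMPATHY", "SUBJ", "OBJ2", "OBJ"]
# Rule 2 preference: smaller score = more preferred transition.
PREFERENCE = {"CONTINUING": 0, "RETAINING": 1, "SHIFTING-1": 2, "SHIFTING": 3}

Roles = Dict[str, Set[str]]  # entity -> roles it fills in one reading


def rank_cf(roles: Roles) -> List[str]:
    """Order entities by their highest-ranked role (unlisted roles last)."""
    pos = {role: i for i, role in enumerate(JAPANESE_RANKING)}
    return sorted(roles, key=lambda e: min(pos.get(r, len(pos)) for r in roles[e]))


def evaluate(prev_cf: Sequence[str], prev_cb: Optional[str],
             roles: Roles) -> Tuple[str, List[str], str]:
    """Compute (Cb, Cf, transition) for one candidate reading of Un."""
    cf = rank_cf(roles)
    # Constraint 3: Cb is the highest-ranked element of Cf(Un-1) realized in Un.
    cb = next((e for e in prev_cf if e in roles), None)
    if cb is None:
        raise ValueError("no element of Cf(Un-1) is realized in Un")
    cp = cf[0]
    if cb == prev_cb:
        transition = "CONTINUING" if cb == cp else "RETAINING"
    else:
        transition = "SHIFTING-1" if cb == cp else "SHIFTING"
    return cb, cf, transition


def resolve(prev_cf: Sequence[str], prev_cb: Optional[str],
            candidates: Dict[str, Roles]) -> List[Tuple[str, str, List[str], str]]:
    """Evaluate every candidate reading and order them by Rule 2."""
    results = []
    for name, roles in candidates.items():
        cb, cf, transition = evaluate(prev_cf, prev_cb, roles)
        results.append((name, cb, cf, transition))
    return sorted(results, key=lambda r: PREFERENCE[r[3]])  # stable sort


# Un+2 of example (5): Cf(Un+1) = [TAROO, HANAKO] and Cb(Un+1) = TAROO.
readings = {
    "he invited her": {"TAROO": {"SUBJ"}, "HANAKO": {"OBJ"}},
    "she invited him": {"HANAKO": {"SUBJ"}, "TAROO": {"OBJ"}},
}
for name, cb, cf, transition in resolve(["TAROO", "HANAKO"], "TAROO", readings):
    print(f"{name}: Cb={cb} Cf={cf} {transition}")
```

Run as written, this lists the CONTINUING reading (he invited her) first and the RETAINING reading (she invited him) second, with TAROO as the Cb of both.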

Many cases of the preference for one interpretation over another follow directly from the distinction between continuing and retaining.

(5)
Un:   Taroo wa paatii ni syootai-sareta.
      party to invited-was
      "Taroo was invited to the party."
      Cb: TAROO
      Cf: [TAROO]

Un+1: Hanako o totemo kiniitta.
      very-much was-fond-of
      "He liked Hanako very much."
      Cb: TAROO
      Cf: [TAROO (subj), HANAKO (obj)]

Un+2: Kinoo eiga ni sasotta rasii.
      yesterday movie to invite seems
      "Seemingly he invited her to a movie."
      Cb: TAROO
      Cf1: [TAROO (subj), HANAKO (obj)]   CONTINUING
      Cf2: [HANAKO (subj), TAROO (obj)]   RETAINING


When the centering algorithm applies to Un+2 in (5), Constraint 3 says the Cb must be the highest-ranked element of Cf(Un+1) realized in Un+2. Because there are 2 zeros in Un+2 and only two entities on Cf(Un+1), TAROO must be realized and therefore must be the Cb. The only continuing interpretation available, Taroo invited Hanako, corresponds to the forward centers list Cf1. The fact that the preferred interpretation is the one in which the subject zero pronoun takes a SUBJECT antecedent is epiphenomenal.

⁵ A discourse entity can simultaneously fulfill multiple roles. The entity is ranked according to its highest-ranked role.

Example (6) demonstrates the effect of the speaker's empathy on the salience of discourse entities.

(6)
Un:   Hanako wa tosyokan de benkyoositeita.
      library in studying-was
      "Hanako was studying in the library."
      Cb: HANAKO
      Cf: [HANAKO]

Un+1: Taroo ga Hanako o tetudatte-kureta.
      help-gave
      "Taroo gave Hanako a favor in helping her."
      Cb: HANAKO
      Cf: [HANAKO (empathy), TAROO (subj)]

Un+2: Tugi no hi eiga ni sasotta.
      next of day SUBJ OBJ movie to invited
      "Next day she invited him to a movie."
      Cb: HANAKO
      Cf: [HANAKO (subj), TAROO (obj)]   CONTINUING

In (6), HANAKO is the most highly ranked entity from Un+1 realized in Un+2, and therefore must be the Cb. The preferred interpretation is therefore the "she invited him..." reading, which results from the more highly ranked continuing transition, in which HANAKO is the preferred center (Cp).

The centering algorithm can also be applied successfully to intrasentential anaphora, by treating the subordinate clause as though it were a separate utterance for the purposes of pronoun interpretation. Consider:

(7)
Taroo wa Kim ni [0 bengosuru] koto o hanasita.
defend comp told
"Taroo told Kim that he would defend her."
Cb: TAROO
Cf1: [TAROO (subj), KIM (obj)]   CONTINUING
Cf2: [KIM (subj), TAROO (obj)]   RETAINING


The CONTINUING interpretation, Taroo told Kim that he would defend her, is preferred to the RETAINING interpretation, Taroo told Kim that she would defend him.



4 Topic ambiguity 

The centering process reduces but does not necessarily eliminate semantic ambiguity in Japanese discourse. Within a loosely defined context, a native speaker's intuitions sometimes still allow for more than one equally preferred interpretation of an utterance.



4.1 Center Establishment 

In the "introduce" example shown in (8) below, ambiguity arises from the combined facts that the Cb of U1 is neutral (undefined) and that there are more entities on the Cf list of U1 than there are zero pronouns in U2.

(8)
U1:   Lyn-ga Masayo-ni Sharon-o shookaisita.
      SUBJ OBJ2 OBJ introduced
      "Lyn introduced Sharon to Masayo."
      Cb: [?]
      Cf: [LYN (subj), MASAYO (obj2), SHARON (obj)]

U2:   kiniitteiru.
      "Lyn likes Masayo" (Cf1a)
      "Lyn likes Sharon" (Cf1b)
      "Masayo likes Sharon" (Cf2)
      Cb1: LYN
      Cb2: MASAYO
      Cf1a: [LYN (subj), MASAYO (obj)]
      Cf1b: [LYN (subj), SHARON (obj)]
      Cf2: [MASAYO (subj), SHARON (obj)]



All three of these readings of U2 are equally preferred CONTINUATIONS. To explain this fact, we posit that the Cb of an initial utterance Un may be treated as a variable, indicated by [?], which can be equated with whatever Cb is assigned to the subsequent utterance Un+1⁶. For example, because there are 2 zeros in U2 of (8) and 3 entities available to fill these positions, Constraint 3 implies that SHARON (the lowest-ranked entity) can never be the Cb, since it will never be the most highly ranked element of Cf(U1) realized in U2. Therefore, whenever LYN is realized, the CONTINUATION interpretation will place LYN in subject position, which explains the first two readings of U2. The third reading is available because no Cb has yet been established for U1, so a continuation does not require the realization of LYN in U2. Notice that any reading that assigns SHARON to the subject position or LYN to a non-subject position would produce a retention.

⁶ Future work will discuss center establishment in more detail, as well as other interactions, e.g., the effect of wa marking.
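The argument above can be checked mechanically by enumerating every assignment of two of the three U1 entities to the subject and object zeros of U2. In this sketch (an illustration, with hypothetical function names), the undefined Cb [?] is modelled as None and treated as equal to whatever Cb U2 receives, and the subject is taken to be Cp(U2) because SUBJ outranks OBJ.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def transition(prev_cb: Optional[str], cb: str, cp: str) -> str:
    """Figure 1 transitions; prev_cb=None models the undefined Cb [?] of an
    initial utterance, which is equated with whatever Cb the next one gets."""
    if prev_cb is None or prev_cb == cb:
        return "CONTINUING" if cb == cp else "RETAINING"
    return "SHIFTING-1" if cb == cp else "SHIFTING"


def two_zero_readings(prev_cf: Sequence[str],
                      prev_cb: Optional[str]) -> List[Tuple[str, str, str]]:
    """Classify every (subject, object) filling of two zeros from Cf(U1)."""
    readings = []
    for subj, obj in permutations(prev_cf, 2):
        # Constraint 3: highest-ranked element of Cf(U1) realized in U2.
        cb = next(e for e in prev_cf if e in (subj, obj))
        cp = subj  # SUBJ outranks OBJ, so the subject is Cp(U2)
        readings.append((subj, obj, transition(prev_cb, cb, cp)))
    return readings


# Example (8): Cf(U1) = [LYN, MASAYO, SHARON] and Cb(U1) = [?].
for subj, obj, t in two_zero_readings(["LYN", "MASAYO", "SHARON"], None):
    print(f"{subj.title()} likes {obj.title()}: {t}")
```

The three CONTINUING lines are exactly the readings listed for U2 in (8); every assignment that puts SHARON in subject position or LYN in a non-subject position comes out as RETAINING. If Cb(U1) were instead fixed as LYN, the Masayo likes Sharon reading would become a SHIFTING-1 transition.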



4.2 Zero Topics 

Another class of ambiguities can result from the optional assignment of TOPIC to a zero pronoun⁷. We propose a topic assignment rule:

Zero Topic Assignment

When no continuation transition is available, and a zero pronoun in Un represents an entity that was the Cb(Un-1), and no other entity in Un is overtly marked as the TOPIC, that zero may be interpreted as the TOPIC of Un.
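The rule's three conditions can be written as a single predicate. This is a hypothetical encoding (the function and parameter names are ours, not the paper's): the caller supplies whether a continuation is otherwise available for Un, which entity the zero realizes, Cb(Un-1), and whether some other entity in Un is overtly marked with wa.

```python
from typing import Optional


def zero_may_be_topic(continuation_available: bool,
                      zero_referent: str,
                      prev_cb: Optional[str],
                      overt_topic_present: bool) -> bool:
    """Zero Topic Assignment: the zero may be read as TOPIC of Un only if
    (a) no CONTINUING transition is otherwise available for Un,
    (b) the zero realizes Cb(Un-1), and
    (c) no other entity in Un is overtly marked as TOPIC.
    """
    return (not continuation_available
            and zero_referent == prev_cb
            and not overt_topic_present)


# Example (9), Un+1 "Hanako ga yatto mituketa": the zero object is TAROO,
# Cb(Un) = TAROO, only a retention is available, and ga is not a topic marker.
print(zero_may_be_topic(False, "TAROO", "TAROO", False))  # True
# Example (10), Un+1 "Hanako-wa yatto mituketa": HANAKO is overtly topic-marked.
print(zero_may_be_topic(False, "TAROO", "TAROO", True))   # False
```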



This fact, which has been overlooked in previous treatments of zero pronouns in Japanese, explains the interesting contrast between the two discourse segments in examples (9) and (10) below. Assume in (9) and (10) that TAROO and HANAKO have already been under discussion.

(9)
Un:   Taroo wa kooen o sanpo-siteita.
      SUBJ park walk-around
      "Taroo was walking around the park."
      Cb: TAROO
      Cf: [TAROO (subj), PARK (obj)]

Un+1: Hanako ga yatto mituketa.
      SUBJ finally found
      "Hanako finally found (him)."
      Cb: TAROO
      Cf1: [TAROO (topic/obj), HANAKO (subj)]   (C)
      Cf2: [HANAKO (subj), TAROO (obj)]   (R)

Un+2: yotei o setumeisita.
      SUBJ OBJ schedule explained
      "He explained the schedule to her." (Cf1)
      "She explained the schedule to him." (Cf2)
      Cb1: TAROO
      Cb2: HANAKO
      Cf1: [TAROO (subj), HANAKO (obj)]   (C)
      Cf2: [HANAKO (subj), TAROO (obj)]   (S-1)


In (9), there are actually two possible Cf lists for Un+1. Cf2, which is the only list possible without topic ambiguity, represents a retention (R) rather than a CONTINUATION (C), thus triggering zero topic assignment. The utterance Un+1 actually has the same meaning under both Cf lists. The ambiguity in Un+2 is caused by the fact that the hearer simultaneously entertains both Cf lists of Un+1. The availability of zero topic assignment means that TAROO can be the Cp even when TAROO is realized as the topic/object. The SHIFTING-1 interpretation results from the algorithm's application to Cf2 of Un+1. We can test whether topic ambiguity is actually the discourse phenomenon at work here by contrasting (9) with its minimal pair (10), in which overt topic marking in Un+1 rules out topic ambiguity.

(10)
Un:   Taroo wa kooen o sanpo-siteita.
      SUBJ park walk-around
      "Taroo was walking around the park."
      Cb: TAROO
      Cf: [TAROO (subj), PARK (obj)]

Un+1: Hanako-wa yatto mituketa.
      TOP/SUBJ finally found
      "Hanako finally found (him)."
      Cb: TAROO
      Cf: [HANAKO (top/subj), TAROO (obj)]   (R)

Un+2: yotei-o setumeisita.
      schedule explained
      "She explained the schedule to him."
      Cb: HANAKO
      Cf: [HANAKO (subj), TAROO (obj)]   (S-1)



⁷ Due to lack of space, we cannot discuss the interaction of center establishment with zero topic assignment here.



In (10), the only Cf possible for Un+1 is the one corresponding to the retention in the parallel utterance in (9). Given that there are 2 zero pronouns in Un+2, Constraint 3 forces a shift. The Hanako explained... interpretation is preferred because it is the more highly ranked SHIFTING-1 transition. If HANAKO could represent a topic-OBJ, there would be another, equally ranked SHIFTING-1 interpretation. However, HANAKO cannot be a zero topic because it was not the Cb of the previous utterance.



5 Discussion 

We have demonstrated a computational treatment of the resolution of zero pronouns in Japanese. Kameyama proposed an analysis of Japanese zero pronouns that used centering but did not distinguish between continuing and retaining, and thus required an extra mechanism, i.e. property-sharing [Kam85]. Our examples (5), (6) and (7) show that property-sharing is an unnecessary stipulation. In addition, there are a number of cases in which property-sharing simply does not work. Our "introduce" example (8) illustrates that it is not essential for a zero pronoun to share a grammatical function property with its antecedent. In fact, property-sharing would falsely predict that the Masayo likes Sharon interpretation of (8) U2 is not possible, as well as falsely predicting the ungrammaticality of examples like (11) below.

(11)
Un:   Hanako wa repooto o kaita.
      report wrote
      "Hanako wrote a report."

Un+1: 0i Taroo ni aini-itta.
      to see-went
      "She went to see Taroo."
      0i = HANAKO [SUBJ, EMPATHY]

Un+2: Taroo wa 0i kibisiku hihansita.
      severely criticized
      "Taroo severely criticized her."
      0i = HANAKO [NONSUBJ, NONEMPATHY]

Property-sharing requires that in Un+2, 0i ≠ HANAKO, since the zero carries the properties (SUBJ, EMPATHY) in Un+1 but has the properties (NONSUBJ, NONEMPATHY) in Un+2⁸. But in fact Un+2 is perfectly acceptable under the intended reading Taroo severely criticized Hanako. Nothing special needs to be said about such cases to get the correct interpretation using the centering algorithm.

We have also proposed a notion of topic ambiguity, which arises from the fact that the grammatical function of unexpressed zero arguments is indeterminate. The application of zero topic assignment also depends on the centering theory distinction between CONTINUING and RETAINING. In addition, the centering construct of backward-looking center, Cb, gives us a computational way of determining when a zero pronoun may be assigned Topic. Topic ambiguity has been ignored in previous analyses, but it explains the availability of interpretations that previous accounts would predict as ungrammatical.

This analysis has implications for the design of language-independent discourse processing modules. We claim that the syntactic factors that affect the ranking of the items on the forward centers list, Cf, will vary from language to language. The ordering for Japanese incorporates topic and empathy into the Cf ranking, which is a single parameter of the centering algorithm. In every other respect, the rules and constraints of the centering framework that the centering algorithm implements remain invariant.

⁸ Kameyama called the Empathy property IDENT.